MYSQL again!

meghatronic · 25 Nov 2006 at 17:47

Okidoke. Another question.... I thought I'd start a new thread as it is different to my other things.

I have a data file formatted like this with 812 lines (each line has NP_xxxxxx.x at the start):

Code:

NP_004125.3	0006457	0003674	0008219	0005737	0005515	0008152	0016265	0005488	0005739	0008150	0005623	0043067	0044260	0006916	0044424	0005575	0050789	0044444	0050791	0017076	0043118	0048523	0043069	0050794	0051244	0009986	0019538	0051243	0005524	0051082	0042981	0043229	0043226	0006915	0044237	0044464	0043227	0005622	0043231	0009987	0007582	0000166	0043066	0043170	0044267	0012501	0048519	0030554	0050875	0044238
NP_001311.3	0030234	0042802	0005956	0003674	0005515	0007165	0005488	0008150	0008605	0019207	0005623	0044424	0005575	0016055	0003824	0019887	0007166	0044464	0005622	0016772	0009987	0007154	0016740	0016301	0043234
NP_001605.1	0005200	0003674	0005515	0044422	0005488	0015629	0005623	0044424	0005575	0017076	0005198	0043232	0044430	0005884	0005524	0005856	0043229	0043226	0044464	0005622	0043228	0044446	0000166	0030554

Now, I want to read this file and count the number of occurences of each number, e.g. just in above "0003674" occurs thrice - once on each line.

I'd ideally like to output a table like the following:

Code:

*number*	*name*		frequency*	*NPs*
------------------------------------------------------------------------------------
0003674		{mysql lookup}		3	NP_004125.3/NP_001311.3/NP_001605.1
0042802		{mysql lookup}		1	NP_001311.3

And so on.... I'll sort the mysql lookup later to fetch the name.

At the moment I'm using:

Code:

<?php
$lines = file('data.txt');
foreach ($lines as $line_num => $line) {
	echo "" . substr_count($line, '0003674') . "<br />\n";
}
?>

And this returns the number of times the term occurs per line - which isn't really what I want plus I have to declare each term manually.

Any help or pointers would be greatly appreciated!

meghatronic · 26 Nov 2006 at 17:20

Anyone got any ideas at all? It has me stumped..

I can get it to read etc.... But can't count and so on.

LazyManc · 26 Nov 2006 at 23:21

Try something like this:

Code:

foreach($lines as $line){  
   // make an array of all the numbers on a line
   $numbers = explode(' ',$line);
   // make a new array, with the number as the key, and a count as a value
   foreach($numbers as $number){
      // if the number already exists in the array, increment
      if ($count[$number]){
         $count[$number]++;
      } else { // otherwise create that array element
         $count[$number] = 1;
      }
   }
}

$count should now be an array of all the numbers which occured in the file as keys, and a count of their frequency as a value.

meghatronic · 26 Nov 2006 at 23:57

LazyManc said:
Try something like this:

Code:

foreach($lines as $line){ // make an array of all the numbers on a line $numbers = explode(' ',$line); // make a new array, with the number as the key, and a count as a value foreach($numbers as $number){ // if the number already exists in the array, increment if ($count[$number]){ $count[$number]++; } else { // otherwise create that array element $count[$number] = 1; } } }

$count should now be an array of all the numbers which occured in the file as keys, and a count of their frequency as a value.

Cheers!

How would I best output this?

LazyManc · 27 Nov 2006 at 16:27

How you output it is up to you, and depends on your situation. If its going to be in a web page, then using a <table> should be fine.

Just start the table, then loop through the $count array, echoing a new row for each key=>value pair, then close the table.

i.e.

Code:

<table>
   <tr><th>Number</th><th>Frequency</th></tr>
<?php
foreach($count as $number => $frequency){
   echo "<tr><td>$number</td><td>$frequency</td></tr>";
}
?>
</table>

meghatronic · 27 Nov 2006 at 19:02

LazyManc said:
How you output it is up to you, and depends on your situation. If its going to be in a web page, then using a <table> should be fine.

Just start the table, then loop through the $count array, echoing a new row for each key=>value pair, then close the table.

Excellent - at least that's started me off! One quick question.

I needed to use explode with \n and \t to sort the data (format as in OP)... and then ereg to check it's numeric (and not include the NP_xxxxxs's) was this neccesarry? From what you were saying I thought it would just rip out all the numbers - have I killed it?

Edit - this sounds tricky, but I now have 3 columns (from OP: *number*, *name* and *frequency*) thanks to your help and a sneaky MYSQL read. However, i'm stumped about how to populate the *NPs* row. Presumuably I could reread the file with each 'number' in the array and mark the first 11 characters from the row each number appears on? More info in OP.

Cheers!

LazyManc · 28 Nov 2006 at 00:47

First off, there's no point using file_get_contents to read the file into a string, and then explode it into an array - let php do the work for you and use file() which reads each line into an array.

If the format of the file is always going to be like in your first post, i.e.

Code:

NP_004125.3	0006457	0003674	0008219	0005737	0005515	0008152	0016265	0005488	0005739	0008150	0005623	0043067	0044260	0006916	0044424	0005575	0050789	0044444	0050791	0017076	0043118	0048523	0043069	0050794	0051244	0009986	0019538	0051243	0005524	0051082	0042981	0043229	0043226	0006915	0044237	0044464	0043227	0005622	0043231	0009987	0007582	0000166	0043066	0043170	0044267	0012501	0048519	0030554	0050875	0044238

and the bit between the NP_xxx and the first number is a tab char, then before you do the explode, just create 2 strings, one containing the NP_xxx number, and one containing all the space delimited numbers, then do the explode on the latter.

Code:

// note: this code will be inside the loop which iterates through the $lines array

// get np_xxxx (everything up to tab)
$np = strtok($line,"\t");
// get everything from the tab onwards then dump the tab
$numberString = trim(strstr($line, "\t"));
// make the array of numbers
$numbers = explode(' ',$numberString);

then you can generate a multidimensional $count array in a format like this:

Code:

array (
          [000123] (
                        frequency  = 3
                        NPs (
                              NP_xxx
                              NP_yyy
                             )
                       )
          [000124] (
                       frequency = 1
                       NPs (
                              NP_zzz
                            )
                       )
          )
etc...

*edit* missed a param on explode

meghatronic · 30 Nov 2006 at 13:09

Seeing as I'm full of questions at the moment and you guys seem to always know the answer... a bit of a thread revival methinks!

My googling has failed me.

I have the following text file (TAB seperated):

Code:

NP_004125.3	0006457	0003674	0008219	0005737	0005515	0008152	0016265	0005488	0005739	0008150	0005623	0043067	0044260	0006916	0044424	0005575	0050789	0044444	0050791	0017076	0043118	0048523	0043069	0050794	0051244	0009986	0019538	0051243	0005524	0051082	0042981	0043229	0043226	0006915	0044237	0044464	0043227	0005622	0043231	0009987	0007582	0000166	0043066	0043170	0044267	0012501	0048519	0030554	0050875	0044238
NP_001311.3	0030234	0042802	0005956	0003674	0005515	0007165	0005488	0008150	0008605	0019207	0005623	0044424	0005575	0016055	0003824	0019887	0007166	0044464	0005622	0016772	0009987	0007154	0016740	0016301	0043234
NP_001605.1	0005200	0003674	0005515	0044422	0005488	0015629	0005623	0044424	0005575	0017076	0005198	0043232	0044430	0005884	0005524	0005856	0043229	0043226	0044464	0005622	0043228	0044446	0000166	0030554

And a mysql db version that has: np|goterms(SPACE seperated).

I made a serious boo boo when I generated this data and need to check something. There are three numbers that MUST be on each and every line. Every line in the flat file and ever column in the db.

How can I check the contents of the line, and if any of these three terms aren't found on the line, they're added to the end of the line.

Likewise, how can I check to see if for each row, the goterms field contains each of these three numbers, and if not they're they're added to the end of the field.

Thanks!

edit - the db and the flatfile contain the same info... so if it is easier to change just the one and update the other I could do that.

edit2- I have the following so far..

Code:

<?
$key = "0003674";
//load file into $fc array
$fc=file("GO_ancestors_for_Acc.txt");
//open same file and use "w" to clear file 
$f=fopen("00.txt","w");
//loop through array using foreach
foreach($fc as $line)
{
     if (!strstr($line,$key)) //look for $key in each line
//something?

$replace = $line. "\t" .$key. "\n";
$cat = str_replace($line, $replace, $line);

fputs($f,$cat);
}
fclose($f);
?>

But it naturally only outputs the ammended lines!

LazyManc · 30 Nov 2006 at 13:51

Assuming you imported the file version with some other script, then just change the file version, truncate the table and re-import it from the (new) file.

jdickerson said:

Code:

<?
$mustHaves = array("0003674","0004567","0001234");
//load file into $fc array
$fc=file("00.txt");
//loop through array using foreach
foreach($fc as &$line)
{
     foreach($mustHaves as $number){
          if (!strstr($line,$number)) {
               $line .= "\t$number";
          }
     }
}
// write to file
?>

Make sure you pass the $line by reference (&$line) cause otherwise the changes will be lost when you exit the foreach.

meghatronic · 30 Nov 2006 at 15:06

Right, I've just noticed something.

I'm using the following to 'count' the frequency, i.e. the number of times a 'term' is repeated in the file. I thought it had worked but at least the top three results are 'wrong'. Anything wrong with the code?

Code:

<?php
$filename = 'file.txt';
$lines = file($filename);

foreach($lines as $line){  
   // make an array of all the numbers on a line
   $numbers = explode("\t",$line);
   // make a new array, with the number as the key, and a count as a value
   foreach($numbers as $number){
      // if the number already exists in the array, increment
      if ($count[$number]){
         $count[$number]++;
      } else { // otherwise create that array element
         $count[$number] = 1;
      }
   }
}

arsort($count);
foreach($count as $number => $frequency){
	echo "GO:$number <b>$frequency</b><br>";
	}
?>

edit, I've actually printed mysqlnumrows vs. 'frequency' to compare and the values are indeed 'off'.

LazyManc · 30 Nov 2006 at 17:11

Try changing

Code:

$numbers = explode("\t",$line);

to

Code:

$numbers = explode("\t",trim($line));

... it appears to be an issue with the new line / carriage return at the end of each line - trim will remove any whitespace at the start and end of $line.

meghatronic · 30 Nov 2006 at 17:14

LazyManc said:
Try changing

Code:

$numbers = explode("\t",$line);

to

Code:

$numbers = explode("\t",trim($line));

... it appears to be an issue with the new line / carriage return at the end of each line - trim will remove any whitespace at the start and end of $line.

I'm going to kiss you.

Cheers!

meghatronic · 30 Nov 2006 at 23:01

Absolutely no point starting a new thread!

Is there anything blindingly wrong with this? (sloppy coding aside):

Code:

<?php
include("db.php");
$lines = file("id.txt");
?>

<table>
<th>id</th><th>swiss acc</th>

<?
foreach($lines as $line){  
	$id = trim($line);
	$query="SELECT * FROM `cheese` WHERE `id` = $id";
	$result=mysql_query($query);
	$num=mysql_numrows($result);

	if ($num > 0) { 
		$i=0;
		while ($i < $num){
			$ids=mysql_result($result,$i,"swiss");
			echo "<tr><td>$line</td><td>$ids</td></tr>";
			}
		$i++;
	}else{
	}
}
?> 
</table>

The abbridged version works, but isn't as useful, as there are more than one rows on the tb that share the same id. Ideally I'd like the output as ID|ids1, ids2,ids3,ids4

Code:

<?php
include("db.php");
$lines = file("geneid.txt");
?>

<table>
<th>geneid</th><th>swissprot acc</th>

<?
foreach($lines as $line){  
$id = trim($line);
$query="SELECT * FROM `cheese` WHERE `geneid` = $id";

$ids=@mysql_result(mysql_query($query),0,"swissprot");

if (!$ids == "") { 
echo "<tr><td>$line</td><td>$ids</td></tr>";
}else{
echo "<tr><td>$line</td><td><i><u>unknown</u></i></td></tr>";
}
}
?> 
</table>

meghatronic · 1 Dec 2006 at 13:29

Still can't get the above post to work...

I'm not sure I'm nesting the queries right?

Basically, I have a db:

1 2
a milk
b eggs
b chocs
b poo
c oranges

and I'd like to output:

a - milk
b - eggs chocs poo
c - oranges

Atm, all I can get is either "b- eggs" or "b - poo" or, "b-eggs, b-chocs, b-poo".

The db is much more complex than that, but that's the jist of it. I want to go from multiple rows of results, to just the one...?

LazyManc · 1 Dec 2006 at 21:18

Try using mysql_fetch_row in a loop...

Code:

$result=mysql_query($query);

while($row = mysql_fetch_row($result)){
    echo 'First field: '.$row[0]; 
    echo 'Second field: '.$row[1];
}

or replace mysql_fetch_row with mysql_fetch_assoc and you can just use $row['column_name'].

LazyManc · 1 Dec 2006 at 21:27

PS - remember, the PHP docs are excellent and your friend. You don't need to know each function-name and what it does off by heart, just learn to use the docs well and you'll naturally pick up useful techniques for doing generic tasks.

meghatronic · 1 Dec 2006 at 22:49

LazyManc said:
PS - remember, the PHP docs are excellent and your friend. You don't need to know each function-name and what it does off by heart, just learn to use the docs well and you'll naturally pick up useful techniques for doing generic tasks.

Oh definately. And I do use them.. it justs stumps me when I have seemingly simple problems! :mad:

Like now for instance!

meghatronic · 2 Dec 2006 at 00:37

Sorted it.

How? By rewriting it backwards from the ground up.. I figure that is the only way to sort the nesting out.